A Self-Learning Universal Concept Spotter
Authors
Abstract
We describe the Universal Spotter, a system for identifying in-text references to entities of an arbitrary, user-specified type, such as people, organizations, equipment, products, materials, etc. Starting with some initial seed examples and a training text corpus, the system generates rules that will find further concepts of the same type. The initial seed information is provided by the user in the form of a typical lexical context in which the entities to be spotted occur, e.g., "the name ends with Co.", or "to the right of produced or made", and so forth, or by simply supplying examples of the concept itself, e.g., Ford Taurus, gas turbine, Big Mac. In addition, negative examples can be supplied, if known. Given a sufficiently large training corpus, an unsupervised learning process is initiated in which the system will: (1) find instances of the sought-after concept using the seed-context information while maximizing recall and precision; (2) find additional contexts in which these entities occur; and (3) expand the initial seed-context with selected new contexts to find even more entities. Preliminary results of creating spotters for organizations and products are discussed.

1 Introduction

Identifying concepts in natural language text is an important information extraction task. Depending upon the current information needs, one may be interested in finding all references to people, locations, dates, organizations, companies, products, equipment, and so on. These concepts, along with their classification, can be used to index any given text for search or categorization purposes, to generate summaries, or to populate database records. However, automating the process of concept identification in unformatted text has not been an easy task. Various single-purpose spotters have been developed for specific types of concepts, including people names, company names, location names, dates, etc.
But those were usually either hand crafted for particular applications or domains, or were heavily relying on a priori lexical clues, such as keywords (e.g., 'Co.'), case (e.g., 'John K. Big'), predictable format (e.g., 123 Maple Street), or a combination thereof. This makes creation and extension of such spotters an arduous manual job. Other, less salient entities, such as products, equipment, foodstuff, or generic references of any kind (e.g., 'a Japanese automaker') could only be identified if a sufficiently detailed domain model was available. Domain-model driven extraction was used in ARPA-sponsored Message Understanding Conferences (MUC); a detailed overview of current research can be found in the proceedings of MUC-5 (muc5, 1993) and the recently concluded MUC-6, as well as Tipster Project meetings, or ARPA's Human Language Technology workshops (tipster1, 1993), (hltw, 1994). We take a somewhat different approach to identify various types of text entities, both generic and specific, without a detailed understanding of the text domain, and relying instead on a combination of shallow linguistic processing (to identify candidate lexical entities), statistical knowledge acquisition, unsupervised learning techniques, and possibly broad (universal but often shallow) knowledge sources, such as on-line dictionaries (e.g., WordNet, Comlex, OALD, etc.). Our method moves beyond the traditional name spotters and towards a universal spotter where the requirements on what to spot can be specified as input parameters, and a specific-purpose spotter could be generated automatically. In this paper, we describe a method of creating spotters for entities of a specified category given only initial seed examples, and using an unsupervised learning process to discover rules for finding more instances of the concept.
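The seed-and-expand cycle described above (spot instances with seed contexts, collect further contexts around them, fold selected contexts back in) can be illustrated with a toy sketch. This is not the authors' implementation: the names `candidates`, `bootstrap`, `CORPUS`, and `min_hits` are hypothetical, capitalized word runs stand in for shallow linguistic processing, and a plain frequency threshold stands in for the recall and precision criteria the paper refers to.

```python
import re
from collections import Counter

# Toy corpus (invented for illustration only).
CORPUS = [
    "The plant produced Ford Taurus last year.",
    "The chain made Big Mac famous.",
    "Dealers sold Ford Taurus at a discount.",
    "Franchises sold Big Mac at a discount.",
    "Stores sold Walkman at a discount.",
]

def candidates(sentence):
    """Yield (left_word, phrase, right_word) for each run of capitalized words.

    Assumption: capitalized runs approximate candidate lexical entities.
    The sentence-initial token is skipped because its capital is uninformative.
    """
    tokens = re.findall(r"[A-Za-z]+", sentence)
    i = 1
    while i < len(tokens):
        if tokens[i][0].isupper():
            j = i
            while j < len(tokens) and tokens[j][0].isupper():
                j += 1
            left = tokens[i - 1].lower()
            right = tokens[j].lower() if j < len(tokens) else "</s>"
            yield left, " ".join(tokens[i:j]), right
            i = j
        else:
            i += 1

def bootstrap(corpus, seed_contexts, rounds=2, min_hits=2):
    """Grow entity and context sets from seed contexts.

    A context is ("L", word) or ("R", word): the word immediately to the
    left or right of a candidate phrase.
    """
    contexts = set(seed_contexts)
    entities = set()
    for _ in range(rounds):
        # (1) Spot candidates that occur in an already trusted context.
        for sent in corpus:
            for left, phrase, right in candidates(sent):
                if ("L", left) in contexts or ("R", right) in contexts:
                    entities.add(phrase)
        # (2) Count every context in which the spotted entities occur.
        counts = Counter()
        for sent in corpus:
            for left, phrase, right in candidates(sent):
                if phrase in entities:
                    counts[("L", left)] += 1
                    counts[("R", right)] += 1
        # (3) Add contexts seen at least min_hits times to the trusted set.
        contexts |= {c for c, n in counts.items() if n >= min_hits}
    return entities, contexts

if __name__ == "__main__":
    seeds = {("L", "produced"), ("L", "made")}  # "to the right of produced or made"
    found, learned = bootstrap(CORPUS, seeds)
    print(sorted(found))  # ['Big Mac', 'Ford Taurus', 'Walkman']
```

In this toy run, "Walkman" never appears next to a seed word; it is found only in the second round, after the context "to the right of sold" has been learned from the first two products. A realistic system would also need to score contexts against negative examples so that overly general contexts are not accepted.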
At this time we place no limit on what kind of things one may want to build a spotter for, although our experiments thus far concentrated on entities customarily re-
Publication date: 1996